Results 1 - 20 of 904
1.
BMC Bioinformatics ; 25(1): 181, 2024 May 08.
Article in English | MEDLINE | ID: mdl-38720247

ABSTRACT

BACKGROUND: RNA sequencing combined with machine learning techniques has provided a modern approach to the molecular classification of cancer. Class predictors, reflecting the disease class, can be constructed for known tissue types using the gene expression measurements extracted from cancer patients. One challenge of current cancer predictors is that they often have suboptimal performance estimates when integrating molecular datasets generated from different labs. Often, the data vary in quality, are procured differently, and contain unwanted noise that hampers the ability of a predictive model to extract useful information. Data preprocessing methods can be applied in attempts to reduce these systematic variations and harmonize the datasets before they are used to build a machine learning model for resolving tissue of origin. RESULTS: We aimed to investigate the impact of data preprocessing steps (focusing on normalization, batch effect correction, and data scaling) through trial and comparison. Our goal was to improve the cross-study predictions of tissue of origin for common cancers on large-scale RNA-Seq datasets derived from thousands of patients and over a dozen tumor types. The results showed that the choice of data preprocessing operations affected the performance of the associated classifier models constructed for tissue of origin predictions in cancer. CONCLUSION: By using TCGA as a training set and applying data preprocessing methods, we demonstrated that batch effect correction improved performance measured by weighted F1-score in resolving tissue of origin against an independent GTEx test dataset. On the other hand, the use of data preprocessing operations worsened classification performance when the independent test dataset was aggregated from separate studies in ICGC and GEO. Therefore, based on our findings with these publicly available large-scale RNA-Seq datasets, the application of data preprocessing techniques to a machine learning pipeline is not always appropriate.


Subject(s)
Machine Learning , Neoplasms , RNA-Seq , Humans , RNA-Seq/methods , Neoplasms/genetics , Transcriptome/genetics , Sequence Analysis, RNA/methods , Gene Expression Profiling/methods , Computational Biology/methods
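The weighted F1-score named in the abstract above can be illustrated with a minimal sketch. It assumes the usual definition (per-class F1 averaged with weights proportional to each class's support in the true labels); the abstract does not say how the authors computed it, and the function name `weighted_f1` and the tissue labels are hypothetical.

```python
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Support-weighted mean of per-class F1 scores (assumed definition)."""
    support = Counter(y_true)          # number of true samples per class
    total = len(y_true)
    score = 0.0
    for label, n in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = n - tp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        score += (n / total) * f1      # weight each class by its support
    return score

# Hypothetical tissue-of-origin labels, for illustration only
y_true = ["lung", "lung", "lung", "liver"]
y_pred = ["lung", "lung", "liver", "liver"]
print(round(weighted_f1(y_true, y_pred), 3))
```

Weighting by support keeps rare tumor types from dominating the score, at the cost of hiding poor performance on them; a macro average would be the alternative if every class should count equally.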
2.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38725155

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) experiments have become instrumental in developmental and differentiation studies, enabling the profiling of cells at a single or multiple time-points to uncover subtle variations in expression profiles reflecting underlying biological processes. Benchmarking studies have compared many of the computational methods used to reconstruct cellular dynamics; however, researchers still encounter challenges in their analysis due to uncertainty with respect to selecting the most appropriate methods and parameters. Even among universal data processing steps used by trajectory inference methods such as feature selection and dimension reduction, trajectory methods' performances are highly dataset-specific. To address these challenges, we developed Escort, a novel framework for evaluating a dataset's suitability for trajectory inference and quantifying trajectory properties influenced by analysis decisions. Escort evaluates the suitability of trajectory analysis and the combined effects of processing choices using trajectory-specific metrics. Escort navigates single-cell trajectory analysis through these data-driven assessments, reducing uncertainty and much of the decision burden inherent to trajectory inference analyses. Escort is implemented in an accessible R package and R/Shiny application, providing researchers with the necessary tools to make informed decisions during trajectory analysis and enabling new insights into dynamic biological processes at single-cell resolution.


Subject(s)
RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , RNA-Seq/methods , Humans , Computational Biology/methods , Sequence Analysis, RNA/methods , Software , Algorithms , Gene Expression Profiling/methods , Single-Cell Gene Expression Analysis
3.
Nat Commun ; 15(1): 3946, 2024 May 10.
Article in English | MEDLINE | ID: mdl-38729950

ABSTRACT

Disease modeling with isogenic Induced Pluripotent Stem Cell (iPSC)-differentiated organoids serves as a powerful technique for studying disease mechanisms. Multiplexed coculture is crucial to mitigate batch effects when studying the genetic effects of disease-causing variants in differentiated iPSCs or organoids, and demultiplexing at the single-cell level can be conveniently achieved by assessing natural genetic barcodes. Here, to enable cost-efficient time-series experimental designs via multiplexed bulk and single-cell RNA-seq of hybrids, we introduce a computational method in our Vireo Suite, Vireo-bulk, to effectively deconvolve pooled bulk RNA-seq data by genotype reference, and thereby quantify donor abundance over the course of differentiation and identify differentially expressed genes among donors. Furthermore, with multiplexed scRNA-seq and bulk RNA-seq, we demonstrate the usefulness and necessity of a pooled design to reveal donor iPSC line heterogeneity during macrophage cell differentiation and to model rare WT1 mutation-driven kidney disease with chimeric organoids. Our work provides an experimental and analytic pipeline for dissecting disease mechanisms with chimeric organoids.


Subject(s)
Cell Differentiation , Induced Pluripotent Stem Cells , Organoids , RNA-Seq , Single-Cell Analysis , Organoids/metabolism , Single-Cell Analysis/methods , Induced Pluripotent Stem Cells/metabolism , Induced Pluripotent Stem Cells/cytology , Humans , Cell Differentiation/genetics , RNA-Seq/methods , Sequence Analysis, RNA/methods , Macrophages/metabolism , Macrophages/cytology , Animals , Single-Cell Gene Expression Analysis
4.
Int J Mol Sci ; 25(9)2024 Apr 26.
Article in English | MEDLINE | ID: mdl-38731950

ABSTRACT

The periodontal ligament (PDL) is a highly specialized fibrous tissue comprising heterogeneous cell populations of an intricate nature. These complexities, along with challenges due to cell culture, impede a comprehensive understanding of periodontal pathophysiology. This study aims to address this gap, employing single-cell RNA sequencing (scRNA-seq) technology to analyze the genetic intricacies of PDL both in vivo and in vitro. Primary human PDL samples (n = 7) were split for direct in vivo analysis and cell culture under serum-containing and serum-free conditions. Cell hashing and sorting, scRNA-seq library preparation using the 10x Genomics protocol, and Illumina sequencing were conducted. Primary analysis was performed using Cellranger, with downstream analysis via the R packages Seurat and SCORPIUS. Seven distinct PDL cell clusters were identified comprising different cellular subsets, each characterized by unique genetic profiles, with some showing donor-specific patterns in representation and distribution. Formation of these cellular clusters was influenced by culture conditions, particularly serum presence. Furthermore, certain cell populations were found to be inherent to the PDL tissue, while others exhibited variability across donors. This study elucidates specific genes and cell clusters within the PDL, revealing both inherent and context-driven subpopulations. The impact of culture conditions, notably the presence of serum, on cell cluster formation highlights the critical need for refining culture protocols, as comprehending these influences can drive the creation of superior culture systems vital for advancing research in PDL biology and regenerative therapies. These discoveries not only deepen our comprehension of PDL biology but also open avenues for future investigations into uncovering underlying mechanisms.


Subject(s)
Periodontal Ligament , Single-Cell Analysis , Humans , Periodontal Ligament/cytology , Periodontal Ligament/metabolism , Single-Cell Analysis/methods , Cells, Cultured , RNA-Seq/methods , Sequence Analysis, RNA/methods , Male , Female , Gene Expression Profiling/methods , Adult , Transcriptome , Single-Cell Gene Expression Analysis
5.
BMC Genomics ; 25(1): 444, 2024 May 06.
Article in English | MEDLINE | ID: mdl-38711017

ABSTRACT

BACKGROUND: Normalization is a critical step in the analysis of single-cell RNA-sequencing (scRNA-seq) datasets. Its main goal is to make gene counts comparable within and between cells. To do so, normalization methods must account for technical and biological variability. Numerous normalization methods have been developed addressing different sources of dispersion and making specific assumptions about the count data. MAIN BODY: The selection of a normalization method has a direct impact on downstream analysis, for example differential gene expression and cluster identification. Thus, the objective of this review is to guide the reader in making an informed decision on the most appropriate normalization method to use. To this aim, we first give an overview of the different single cell sequencing platforms and methods commonly used, including isolation and library preparation protocols. Next, we discuss the inherent sources of variability of scRNA-seq datasets. We describe the categories of normalization methods and include examples of each. We also delineate imputation and batch-effect correction methods. Furthermore, we describe data-driven metrics commonly used to evaluate the performance of normalization methods. We also discuss common scRNA-seq methods and toolkits used for integrated data analysis. CONCLUSIONS: According to the correction performed, normalization methods can be broadly classified as within- and between-sample algorithms. Moreover, with respect to the mathematical model used, normalization methods can further be classified into global scaling methods, generalized linear models, mixed methods, and machine learning-based methods. Each of these methods has pros and cons and makes different statistical assumptions. However, no single normalization method performs best. Instead, metrics such as silhouette width, K-nearest neighbor batch-effect test, or Highly Variable Genes are recommended to assess the performance of normalization methods.


Subject(s)
Single-Cell Analysis , Single-Cell Analysis/methods , Humans , Gene Expression Profiling/methods , Gene Expression Profiling/standards , Sequence Analysis, RNA/methods , Transcriptome , Algorithms , RNA-Seq/methods , RNA-Seq/standards , Animals
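As an illustration of the "global scaling" category mentioned in the review abstract above, the following is a minimal sketch of library-size normalization followed by a log transform. The scale factor of 10,000 and the function name `normalize_counts` are assumptions chosen for the example, not values taken from the review.

```python
import math

def normalize_counts(counts, scale=10_000):
    """Global scaling sketch: divide each cell's counts by its total,
    multiply by a common scale factor, then apply log(1 + x)."""
    normalized = []
    for cell in counts:
        total = sum(cell)              # library size of this cell
        if total == 0:
            # An empty cell stays all-zero rather than dividing by zero.
            normalized.append([0.0] * len(cell))
            continue
        normalized.append([math.log1p(c / total * scale) for c in cell])
    return normalized

# Two hypothetical cells with the same composition but 10x different depth
counts = [[10, 0, 90], [1, 0, 9]]
for row in normalize_counts(counts):
    print([round(v, 3) for v in row])
```

Both rows come out identical, which is the point of within-sample scaling: differences in sequencing depth are removed while relative expression is kept. It does not address batch effects or count-dependent variance, which the other method categories in the review target.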
6.
Methods Mol Biol ; 2808: 121-127, 2024.
Article in English | MEDLINE | ID: mdl-38743366

ABSTRACT

During the infection of a host cell by an infectious agent, a series of gene expression changes occurs as a consequence of host-pathogen interactions. Unraveling this complex interplay is key to understanding microbial virulence and host response pathways, thus providing the basis for new molecular insights into the mechanisms of pathogenesis and the corresponding immune response. Dual RNA sequencing (dual RNA-seq) has been developed to simultaneously determine pathogen and host transcriptomes, enabling both differential and coexpression analyses between the two partners as well as genome characterization in the case of RNA viruses. Here, we provide a detailed laboratory protocol and bioinformatics analysis guidelines for dual RNA-seq experiments focusing on, but not restricted to, measles virus (MeV) as a pathogen of interest. The application of dual RNA-seq technologies in MeV-infected patients can potentially provide valuable information on the structure of the viral RNA genome and on cellular innate immune responses and drive the discovery of new targets for antiviral therapy.


Subject(s)
Genome, Viral , Host-Pathogen Interactions , Measles virus , Measles , RNA, Viral , Humans , Measles/virology , Measles/immunology , Measles/genetics , Measles virus/genetics , Measles virus/pathogenicity , RNA, Viral/genetics , Host-Pathogen Interactions/genetics , Host-Pathogen Interactions/immunology , Computational Biology/methods , Sequence Analysis, RNA/methods , RNA-Seq/methods , Transcriptome , Gene Expression Profiling/methods , High-Throughput Nucleotide Sequencing/methods
7.
Nat Commun ; 15(1): 4055, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38744843

ABSTRACT

We introduce GRouNdGAN, a gene regulatory network (GRN)-guided reference-based causal implicit generative model for simulating single-cell RNA-seq data, in silico perturbation experiments, and benchmarking GRN inference methods. Through the imposition of a user-defined GRN in its architecture, GRouNdGAN simulates steady-state and transient-state single-cell datasets where genes are causally expressed under the control of their regulating transcription factors (TFs). Training on six experimental reference datasets, we show that our model captures non-linear TF-gene dependencies and preserves gene identities, cell trajectories, pseudo-time ordering, and technical and biological noise, with no user manipulation and only implicit parameterization. GRouNdGAN can synthesize cells under new conditions to perform in silico TF knockout experiments. Benchmarking various GRN inference algorithms reveals that GRouNdGAN effectively bridges the existing gap between simulated and biological data benchmarks of GRN inference algorithms, providing gold standard ground truth GRNs and realistic cells corresponding to the biological system of interest.


Subject(s)
Algorithms , Computer Simulation , Gene Regulatory Networks , RNA-Seq , Single-Cell Analysis , Single-Cell Analysis/methods , RNA-Seq/methods , Humans , Transcription Factors/metabolism , Transcription Factors/genetics , Computational Biology/methods , Benchmarking , Sequence Analysis, RNA/methods , Single-Cell Gene Expression Analysis
8.
Nat Commun ; 15(1): 4050, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38744866

ABSTRACT

Although more than half of all genes generate transcripts that differ in 3'UTR length, current analysis pipelines only quantify the amount but not the length of mRNA transcripts. 3'UTR length is determined by 3' end cleavage sites (CS). We map CS in more than 200 primary human and mouse cell types and increase CS annotations relative to the GENCODE database by 40%. Approximately half of all CS are used in few cell types, revealing that most genes only have one or two major 3' ends. We incorporate the CS annotations into a computational pipeline, called scUTRquant, for rapid, accurate, and simultaneous quantification of gene and 3'UTR isoform expression from single-cell RNA sequencing (scRNA-seq) data. When applying scUTRquant to data from 474 cell types and 2134 perturbations, we discover extensive 3'UTR length changes across cell types that are as widespread and coordinately regulated as gene expression changes but affect mostly different genes. Our data indicate that mRNA abundance and mRNA length are two largely independent axes of gene regulation that together determine the amount and spatial organization of protein synthesis.


Subject(s)
3' Untranslated Regions , RNA, Messenger , Single-Cell Analysis , 3' Untranslated Regions/genetics , Humans , Animals , Mice , RNA, Messenger/genetics , RNA, Messenger/metabolism , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Gene Expression Regulation , RNA-Seq/methods , Computational Biology/methods , Gene Expression Profiling/methods , Single-Cell Gene Expression Analysis
9.
Sci Rep ; 14(1): 10983, 2024 May 14.
Article in English | MEDLINE | ID: mdl-38744869

ABSTRACT

Parkinson's disease (PD) is a complex neurodegenerative disorder without a cure. The onset of PD symptoms corresponds to 50% loss of midbrain dopaminergic (mDA) neurons, limiting early-stage understanding of PD. To shed light on early PD development, we study time series scRNA-seq datasets of mDA neurons obtained from patient-derived induced pluripotent stem cell differentiation. We develop a new data integration method based on Non-negative Matrix Tri-Factorization that integrates these datasets with molecular interaction networks, producing condition-specific "gene embeddings". By mining these embeddings, we predict 193 PD-related genes that are largely supported (49.7%) in the literature and are specific to the investigated PINK1 mutation. Enrichment analysis in Kyoto Encyclopedia of Genes and Genomes pathways highlights 10 PD-related molecular mechanisms perturbed during early PD development. Finally, investigating the top 20 prioritized genes reveals 12 previously unrecognized genes associated with PD that represent interesting drug targets.


Subject(s)
Dopaminergic Neurons , Parkinson Disease , Parkinson Disease/genetics , Parkinson Disease/pathology , Humans , Dopaminergic Neurons/metabolism , Dopaminergic Neurons/pathology , RNA-Seq/methods , Induced Pluripotent Stem Cells/metabolism , Mesencephalon/metabolism , Mesencephalon/pathology , Gene Regulatory Networks , Mutation , Cell Differentiation/genetics , Multiomics , Single-Cell Gene Expression Analysis
10.
Sci Rep ; 14(1): 10940, 2024 May 13.
Article in English | MEDLINE | ID: mdl-38740888

ABSTRACT

Improving baking quality is a primary challenge in the wheat flour production value chain, as baking quality is a crucial factor in determining the flour's overall value. In the present study, we conducted a comparative RNA-Seq analysis of the high baking quality mutant "O-64.1.10" genotype and its low baking quality wild type "Omid" cultivar to identify potential genes associated with bread quality. The cDNA libraries were constructed from immature grains 15 days post-anthesis, with an average of 16.24 and 18.97 million paired-end short-read sequences in the mutant and wild type, respectively. A total number of 733 transcripts with differential expression were identified, 585 genes up-regulated and 188 genes down-regulated in the "O-64.1.10" genotype compared to the "Omid". In addition, the families of HSF, bZIP, C2C2-Dof, B3-ARF, BES1, C3H, GRF, HB-HD-ZIP, PLATZ, MADS-MIKC, GARP-G2-like, NAC, OFP and TUB emerged as the key transcription factors with specific expression in the "O-64.1.10" genotype. At the same time, pathways related to baking quality were identified through the Kyoto Encyclopedia of Genes and Genomes. Collectively, we found that the endoplasmic network, metabolic pathways, secondary metabolite biosynthesis, hormone signaling pathway, B group vitamins, protein pathways, pathways associated with carbohydrate and fat metabolism, as well as the biosynthesis and metabolism of various amino acids, have a great deal of potential to play a significant role in baking quality. Ultimately, the RNA-seq results were confirmed using quantitative Reverse Transcription PCR for some hub genes such as alpha-gliadin, low molecular weight glutenin subunit and terpene synthase (gibberellin). As a resource for future study, 127 EST-SSR primers were generated using the RNA-seq data.


Subject(s)
Gene Expression Profiling , Gene Expression Regulation, Plant , RNA-Seq , Triticum , Triticum/genetics , Triticum/growth & development , Triticum/metabolism , RNA-Seq/methods , Gene Expression Profiling/methods , Transcriptome , Edible Grain/genetics , Edible Grain/metabolism , Cooking , Bread , Plant Proteins/genetics , Plant Proteins/metabolism , Genotype , Flour
11.
Genet Res (Camb) ; 2024: 4285171, 2024.
Article in English | MEDLINE | ID: mdl-38715622

ABSTRACT

Bladder cancer has recently seen an alarming increase in global diagnoses, ascending as a predominant cause of cancer-related mortalities. Given this pressing scenario, there is a burgeoning need to identify effective biomarkers for both the diagnosis and therapeutic guidance of bladder cancer. This study focuses on evaluating the potential of high-definition computed tomography (CT) imagery coupled with RNA-sequencing analysis to accurately predict bladder tumor stages, utilizing deep residual networks. Data for this study, including CT images and RNA-Seq datasets for 82 high-grade bladder cancer patients, were sourced from the TCIA and TCGA databases. We employed Cox and lasso regression analyses to determine radiomics and gene signatures, leading to the identification of a three-factor radiomics signature and a four-gene signature in our bladder cancer cohort. ROC curve analyses underscored the strong predictive capacities of both these signatures. Furthermore, we formulated a nomogram integrating clinical features, radiomics, and gene signatures. This nomogram's AUC scores stood at 0.870, 0.873, and 0.971 for 1-year, 3-year, and 5-year predictions, respectively. Our model, leveraging radiomics and gene signatures, presents significant promise for enhancing diagnostic precision in bladder cancer prognosis, advocating for its clinical adoption.


Subject(s)
Neoplasm Staging , Neural Networks, Computer , Tomography, X-Ray Computed , Urinary Bladder Neoplasms , Urinary Bladder Neoplasms/genetics , Urinary Bladder Neoplasms/diagnostic imaging , Urinary Bladder Neoplasms/pathology , Humans , Tomography, X-Ray Computed/methods , Male , Female , RNA-Seq/methods , Aged , Nomograms , Middle Aged , Biomarkers, Tumor/genetics , ROC Curve , Prognosis , Transcriptome , Radiomics
12.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701412

ABSTRACT

Trajectory inference is a crucial task in single-cell RNA-sequencing downstream analysis, which can reveal the dynamic processes of biological development, including cell differentiation. Dimensionality reduction is an important step in the trajectory inference process. However, most existing trajectory methods rely on cell features derived from traditional dimensionality reduction methods, such as principal component analysis and uniform manifold approximation and projection. These methods are not specifically designed for trajectory inference and fail to fully leverage prior information from upstream analysis, limiting their performance. Here, we introduce scCRT, a novel dimensionality reduction model for trajectory inference. In order to utilize prior information to learn accurate cell representations, scCRT integrates two feature learning components: a cell-level pairwise module and a cluster-level contrastive module. The cell-level module focuses on learning accurate cell representations in a reduced-dimensionality space while maintaining the cell-cell positional relationships in the original space. The cluster-level contrastive module uses prior cell state information to aggregate similar cells, preventing excessive dispersion in the low-dimensional space. Experimental findings from 54 real and 81 synthetic datasets, totaling 135 datasets, highlighted the superior performance of scCRT compared with commonly used trajectory inference methods. Additionally, an ablation study revealed that both cell-level and cluster-level modules enhance the model's ability to learn accurate cell features, facilitating cell lineage inference. The source code of scCRT is available at https://github.com/yuchen21-web/scCRT-for-scRNA-seq.


Subject(s)
Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , Humans , RNA-Seq/methods , Computational Biology/methods , Software , Sequence Analysis, RNA/methods , Animals , Single-Cell Gene Expression Analysis
13.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38701413

ABSTRACT

With the emergence of large amounts of single-cell RNA sequencing (scRNA-seq) data, the exploration of computational methods has become critical in revealing biological mechanisms. Clustering is a representative approach for deciphering cellular heterogeneity embedded in scRNA-seq data. However, due to the diversity of datasets, none of the existing single-cell clustering methods shows overwhelming performance on all datasets. Weighted ensemble methods have been proposed to integrate multiple results to improve heterogeneity analysis performance. These methods are usually weighted by considering the reliability of the base clustering results, ignoring the performance difference of the same base clustering on different cells. In this paper, we propose scEWE, a self-representative ensemble learning framework based on a high-order element-wise weighting strategy. By assigning different base clustering weights to individual cells, we carefully construct and optimize the consensus matrix. In addition, we extract the high-order information between cells, which enhances the ability to represent the similarity relationship between cells. scEWE is experimentally shown to significantly outperform the state-of-the-art methods, which strongly demonstrates the effectiveness of the method and supports its potential applications in complex single-cell data analytical problems.


Subject(s)
Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Sequence Analysis, RNA/methods , Algorithms , Computational Biology/methods , Humans , RNA-Seq/methods
14.
Brief Bioinform ; 25(3)2024 Mar 27.
Article in English | MEDLINE | ID: mdl-38706317

ABSTRACT

Single-cell RNA sequencing (scRNA-seq) enables the exploration of cellular heterogeneity by analyzing gene expression profiles in complex tissues. However, scRNA-seq data often suffer from technical noise, dropout events and sparsity, hindering downstream analyses. Although existing works attempt to mitigate these issues by utilizing graph structures for data denoising, they involve the risk of propagating noise and fall short of fully leveraging the inherent data relationships, relying mainly on either cell-cell or gene-gene associations and on graphs constructed from the initial noisy data. To this end, this study presents single-cell bilevel feature propagation (scBFP), a two-step graph-based feature propagation method. It initially imputes zero values using non-zero values, ensuring that the imputation process does not affect the non-zero values due to dropout. Subsequently, it denoises the entire dataset by leveraging gene-gene and cell-cell relationships in the respective steps. Extensive experimental results on scRNA-seq data demonstrate the effectiveness of scBFP in various downstream tasks, uncovering valuable biological insights.


Subject(s)
Sequence Analysis, RNA , Single-Cell Analysis , Single-Cell Analysis/methods , Sequence Analysis, RNA/methods , Humans , Algorithms , Gene Expression Profiling/methods , Computational Biology/methods , RNA-Seq/methods
15.
Dev Cell ; 59(9): 1210-1230.e9, 2024 May 06.
Article in English | MEDLINE | ID: mdl-38569548

ABSTRACT

The Drosophila larval ventral nerve cord (VNC) shares many similarities with the spinal cord of vertebrates and has emerged as a major model for understanding the development and function of motor systems. Here, we use high-quality scRNA-seq, validated by anatomical identification, to create a comprehensive census of larval VNC cell types. We show that the neural lineages that comprise the adult VNC are already defined, but quiescent, at the larval stage. Using fluorescence-activated cell sorting (FACS)-enriched populations, we separate all motor neuron bundles and link individual neuron clusters to morphologically characterized known subtypes. We discovered a glutamate receptor subunit required for basal neurotransmission and homeostasis at the larval neuromuscular junction. We describe larval glia and endorse the general view that glia perform consistent activities throughout development. This census represents an extensive resource and a powerful platform for future discoveries of cellular and molecular mechanisms in repair, regeneration, plasticity, homeostasis, and behavioral coordination.


Subject(s)
Drosophila melanogaster , Larva , Motor Neurons , Animals , Larva/genetics , Larva/metabolism , Motor Neurons/metabolism , Motor Neurons/cytology , Drosophila melanogaster/genetics , Drosophila melanogaster/metabolism , Neuroglia/metabolism , Neuroglia/cytology , Neuromuscular Junction/metabolism , Drosophila Proteins/metabolism , Drosophila Proteins/genetics , RNA-Seq/methods , Single-Cell Gene Expression Analysis
16.
Methods Mol Biol ; 2787: 225-243, 2024.
Article in English | MEDLINE | ID: mdl-38656493

ABSTRACT

Coffee, an important agricultural product for tropical producing countries, is facing challenges due to climate change, including periods of drought, irregular rain distribution, and high temperatures. These changes result in plant water stress, leading to significant losses in coffee productivity and quality. Understanding the processes that affect coffee flowering is crucial for improving productivity and quality. In this chapter, we describe a protocol for transcriptome analysis using available Internet software, mainly in the Galaxy Platform, using RNA-Seq data from flowers collected from different parts of the coffee tree. The methods presented in this chapter provide a comprehensive protocol for transcriptome analysis of differentially expressed genes from flowers of the coffee plant. This knowledge can be utilized in coffee genetic improvement programs, particularly in the selection of cultivars that are tolerant to water deficit.


Subject(s)
Coffea , Flowers , Gene Expression Profiling , Gene Expression Regulation, Plant , Transcriptome , Flowers/genetics , Coffea/genetics , Gene Expression Profiling/methods , Transcriptome/genetics , Software , Computational Biology/methods , RNA-Seq/methods
17.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38684178

ABSTRACT

MOTIVATION: Continuous advancements in single-cell RNA sequencing (scRNA-seq) technology have enabled researchers to further explore the study of cell heterogeneity, trajectory inference, identification of rare cell types, and neurology. Accurate scRNA-seq data clustering is crucial in single-cell sequencing data analysis. However, the high dimensionality, sparsity, and presence of "false" zero values in the data can pose challenges to clustering. Furthermore, current unsupervised clustering algorithms have not effectively leveraged prior biological knowledge, making cell clustering even more challenging. RESULTS: This study investigates a semisupervised clustering model called scTPC, which integrates the triplet constraint, pairwise constraint, and cross-entropy constraint based on deep learning. Specifically, the model begins by pretraining a denoising autoencoder based on a zero-inflated negative binomial distribution. Deep clustering is then performed in the learned latent feature space using triplet constraints and pairwise constraints generated from partially labeled cells. Finally, to address imbalanced cell-type datasets, a weighted cross-entropy loss is introduced to optimize the model. A series of experimental results on 10 real scRNA-seq datasets and five simulated datasets demonstrate that scTPC achieves accurate clustering with a well-designed framework. AVAILABILITY AND IMPLEMENTATION: scTPC is a Python-based algorithm, and the code is available from https://github.com/LF-Yang/Code or https://zenodo.org/records/10951780.


Subject(s)
Algorithms , Single-Cell Analysis , Single-Cell Analysis/methods , Cluster Analysis , Humans , Sequence Analysis, RNA/methods , RNA-Seq/methods , Deep Learning , Software , Single-Cell Gene Expression Analysis
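The weighted cross-entropy loss mentioned in the scTPC abstract above can be sketched generically as follows. This is not scTPC's code: the inverse-frequency weighting scheme and the names `inverse_frequency_weights` and `weighted_cross_entropy` are assumptions made for illustration, since the abstract does not state how the class weights are chosen.

```python
import math
from collections import Counter

def inverse_frequency_weights(labels):
    """Assumed weighting: n_samples / (n_classes * class_count),
    so rarer cell types receive larger weights."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {c: n / (k * m) for c, m in counts.items()}

def weighted_cross_entropy(probs, labels, weights, eps=1e-12):
    """probs: one dict per cell mapping class -> predicted probability.
    Returns the weight-normalized mean of -w[y] * log p(y)."""
    num = sum(-weights[y] * math.log(max(p[y], eps))
              for p, y in zip(probs, labels))
    den = sum(weights[y] for y in labels)
    return num / den

# Hypothetical imbalanced labels: three cells of type A, one of type B
labels = ["A", "A", "A", "B"]
weights = inverse_frequency_weights(labels)
probs = [{"A": 0.9, "B": 0.1}] * 3 + [{"A": 0.6, "B": 0.4}]
print(weights)
print(weighted_cross_entropy(probs, labels, weights))
```

With these weights the single misclassified type-B cell contributes as much to the loss as all three type-A cells together, which is the intended effect on imbalanced cell-type datasets.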
18.
Am J Hum Genet ; 111(5): 841-862, 2024 May 02.
Article in English | MEDLINE | ID: mdl-38593811

ABSTRACT

RNA sequencing (RNA-seq) has recently been used in translational research settings to facilitate diagnoses of Mendelian disorders. A significant obstacle for clinical laboratories in adopting RNA-seq is the low or absent expression of a significant number of disease-associated genes/transcripts in clinically accessible samples. As this is especially problematic in neurological diseases, we developed a clinical diagnostic approach that enhanced the detection and evaluation of tissue-specific genes/transcripts through fibroblast-to-neuron cell transdifferentiation. The approach is designed specifically to suit clinical implementation, emphasizing simplicity, cost effectiveness, turnaround time, and reproducibility. For clinical validation, we generated induced neurons (iNeurons) from 71 individuals with primary neurological phenotypes recruited to the Undiagnosed Diseases Network. The overall diagnostic yield was 25.4%. Over a quarter of the diagnostic findings benefited from transdifferentiation and could not be achieved by fibroblast RNA-seq alone. This iNeuron transcriptomic approach can be effectively integrated into diagnostic whole-transcriptome evaluation of individuals with genetic disorders.


Subject(s)
Cell Transdifferentiation , Fibroblasts , Neurons , Sequence Analysis, RNA , Humans , Cell Transdifferentiation/genetics , Fibroblasts/metabolism , Fibroblasts/cytology , Sequence Analysis, RNA/methods , Neurons/metabolism , Neurons/cytology , Transcriptome , Reproducibility of Results , Nervous System Diseases/genetics , Nervous System Diseases/diagnosis , RNA-Seq/methods , Female , Male
20.
Bioinformatics ; 40(5)2024 May 02.
Article in English | MEDLINE | ID: mdl-38662553

ABSTRACT

SUMMARY: Existing clustering methods for characterizing cell populations from single-cell RNA sequencing are constrained by several limitations stemming from the fact that clusters often cannot be homogeneous, particularly for transitioning populations. On the other hand, dominant cell populations within samples can be identified independently by their strong gene co-expression signatures using methods unrelated to partitioning. Here, we introduce a clustering method, CASCC (co-expression-assisted single-cell clustering), designed to improve biological accuracy using gene co-expression features identified using an unsupervised adaptive attractor algorithm. CASCC outperformed other methods as evidenced by multiple evaluation metrics, and our results suggest that CASCC can improve the analysis of single-cell transcriptomics, enabling potential new discoveries related to underlying biological mechanisms. AVAILABILITY AND IMPLEMENTATION: The CASCC R package is publicly available at https://github.com/LingyiC/CASCC and https://zenodo.org/doi/10.5281/zenodo.10648327.


Subject(s)
Algorithms , RNA-Seq , Single-Cell Analysis , Software , Single-Cell Analysis/methods , Cluster Analysis , RNA-Seq/methods , Humans , Gene Expression Profiling/methods , Sequence Analysis, RNA/methods , Single-Cell Gene Expression Analysis